Benchmarking optimization algorithms for auto-tuning GPU kernels
Recent years have witnessed phenomenal growth in the application and
capabilities of Graphics Processing Units (GPUs) due to their high parallel
computation power at relatively low cost. However, writing a computationally
efficient GPU program (kernel) is challenging, and generally only certain
specific kernel configurations lead to significant increases in performance.
Auto-tuning is the process of automatically optimizing software for
highly-efficient execution on a target hardware platform. Auto-tuning is
particularly useful for GPU programming, as a single kernel requires re-tuning
after code changes, for different input data, and for different architectures.
However, the discrete and non-convex nature of the search space creates a
challenging optimization problem. In this work, we investigate which algorithm
produces the fastest kernels if the time-budget for the tuning task is varied.
We conduct a survey by performing experiments on 26 kernel search spaces
from 9 different GPUs, for 16 evolutionary black-box optimization
algorithms. We then analyze these results and introduce a novel metric based on
the PageRank centrality concept as a tool for gaining insight into the
difficulty of the optimization problem. We demonstrate that our metric
correlates strongly with observed tuning performance.
Comment: in IEEE Transactions on Evolutionary Computation, 202
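As a rough, hypothetical sketch of the idea behind a PageRank-based difficulty metric, the code below models a toy tuning search space as a directed graph whose edges point from each kernel configuration to its strictly better neighbours, then runs standard power-iteration PageRank. The graph construction, parameter values, and synthetic runtimes are illustrative assumptions, not the paper's exact metric.

```python
# Hypothetical sketch: PageRank-style centrality over a kernel tuning space.
import itertools
import random

random.seed(0)

# Toy search space: two tunable parameters with a few discrete values each.
axes = ([1, 2, 4, 8], [16, 32, 64, 128])
space = list(itertools.product(*axes))
# Synthetic "runtime" per configuration (lower is better); a stand-in for
# real kernel benchmarks.
runtime = {cfg: random.random() for cfg in space}

def neighbors(cfg):
    """Configurations differing from cfg in exactly one parameter."""
    for i, axis in enumerate(axes):
        for v in axis:
            if v != cfg[i]:
                yield cfg[:i] + (v,) + cfg[i + 1:]

# Directed edges point from each configuration to its strictly better neighbors.
edges = {cfg: [n for n in neighbors(cfg) if runtime[n] < runtime[cfg]]
         for cfg in space}

# Standard PageRank by power iteration; damping factor d = 0.85.
d = 0.85
rank = {cfg: 1.0 / len(space) for cfg in space}
for _ in range(100):
    new = {cfg: (1 - d) / len(space) for cfg in space}
    for cfg, outs in edges.items():
        if outs:
            for n in outs:
                new[n] += d * rank[cfg] / len(outs)
        else:  # local optimum (no better neighbor): treat as a dangling node
            for n in space:
                new[n] += d * rank[cfg] / len(space)
    rank = new

# High-centrality configurations are the attractors of greedy descent.
top = max(rank, key=rank.get)
print(top, runtime[top], rank[top])
```

Configurations that accumulate high centrality act as attractors for local search, which is why such a measure can signal how difficult a search space is for a given time budget.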
Deep data compression for approximate ultrasonic image formation
In many ultrasonic imaging systems, data acquisition and image formation are
performed on separate computing devices. Data transmission is becoming a
bottleneck; thus, efficient data compression is essential. Compression rates
can be improved by considering the fact that many image formation methods rely
on approximations of wave-matter interactions, and only use the corresponding
part of the data. Tailored data compression could exploit this, but extracting
the useful part of the data efficiently is not always trivial. In this work, we
tackle this problem using deep neural networks, optimized to preserve the image
quality of a particular image formation method. We examine the Delay-And-Sum
(DAS) algorithm, which is used in reflectivity-based ultrasonic imaging.
We propose a novel encoder-decoder architecture with vector quantization and
formulate image formation as a network layer for end-to-end training.
Experiments demonstrate that our proposed data compression, tailored for a
specific image formation method, obtains significantly better results than
compression that is agnostic to the subsequent imaging. We maintain high image quality
at much higher compression rates than the theoretical lossless compression rate
derived from the rank of the linear imaging operator. This demonstrates the
great potential of deep ultrasonic data compression tailored for a specific
image formation method.
Comment: IEEE International Ultrasonics Symposium 202
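For readers unfamiliar with vector quantization in this setting, below is a minimal sketch of a VQ bottleneck with a straight-through gradient estimator, in the spirit of VQ-VAE. The codebook size, dimensions, and loss weighting are illustrative assumptions and do not reproduce the paper's architecture; in the proposed setup, a differentiable image formation layer (here, DAS) would be appended after the decoder so that the training loss is measured on the formed image rather than on the raw channel data.

```python
# Minimal VQ bottleneck sketch (VQ-VAE style); hyperparameters are assumptions.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Codebook lookup with a straight-through gradient estimator."""
    def __init__(self, num_codes=256, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta  # commitment loss weight (assumed value)

    def forward(self, z):
        # z: (batch, n_vectors, dim) latents from the encoder.
        # Squared distance from every latent vector to every codebook entry.
        dist = ((z.unsqueeze(-2) - self.codebook.weight) ** 2).sum(-1)
        idx = dist.argmin(dim=-1)          # (batch, n_vectors) code indices
        zq = self.codebook(idx)            # quantized latents
        # Codebook loss moves codes toward encoder outputs; commitment loss
        # keeps encoder outputs close to their assigned codes.
        loss = (((zq - z.detach()) ** 2).mean()
                + self.beta * ((z - zq.detach()) ** 2).mean())
        # Straight-through: gradients flow to z as if quantization were identity.
        zq = z + (zq - z).detach()
        return zq, idx, loss

vq = VectorQuantizer()
zq, idx, loss = vq(torch.randn(2, 128, 64))
# Only `idx` needs to be transmitted; the receiver's decoder looks up the codes.
```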
Geometric reconstruction methods for electron tomography
Electron tomography is becoming an increasingly important tool in materials
science for studying the three-dimensional morphologies and chemical
compositions of nanostructures. The image quality obtained by many current
algorithms is seriously affected by the problems of missing wedge artefacts and
nonlinear projection intensities due to diffraction effects. The former refers
to the fact that data cannot be acquired over the full tilt range;
the latter implies that for some orientations, crystalline structures can show
strong contrast changes. To overcome these problems we introduce and discuss
several algorithms from the mathematical fields of geometric and discrete
tomography. The algorithms incorporate geometric prior knowledge (mainly
convexity and homogeneity), which in principle also considerably reduces the
number of tilt angles required. Results are discussed for the reconstruction of
an InAs nanowire.
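To convey the core idea of discrete tomography, that restricting the reconstruction to a few homogeneous grey levels makes even very few projections informative, below is Ryser's classical algorithm for reconstructing a binary image from just its row and column sums. This is a textbook illustration only; the algorithms discussed in this work operate on real tilt series and additionally exploit convexity priors.

```python
# Textbook illustration: binary reconstruction from two projections (Ryser).
import numpy as np

def ryser(row_sums, col_sums):
    """Return a binary matrix with the given row/column sums, if one exists."""
    n_rows, n_cols = len(row_sums), len(col_sums)
    img = np.zeros((n_rows, n_cols), dtype=int)
    cols = list(col_sums)  # remaining demand per column
    # Process rows from largest to smallest row sum, placing each row's ones
    # into the columns with the largest remaining demand.
    for r in np.argsort(row_sums)[::-1]:
        for c in np.argsort(cols)[::-1][: row_sums[r]]:
            img[r, c] = 1
            cols[c] -= 1
    if any(c != 0 for c in cols):
        raise ValueError("no binary image matches these projections")
    return img

phantom = np.array([[0, 1, 1, 0],
                    [1, 1, 1, 1],
                    [1, 1, 1, 1],
                    [0, 1, 1, 0]])
rec = ryser(phantom.sum(axis=1), phantom.sum(axis=0))
assert (rec.sum(axis=1) == phantom.sum(axis=1)).all()
assert (rec.sum(axis=0) == phantom.sum(axis=0)).all()
```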
A tomographic workflow to enable deep learning for X-ray based foreign object detection
Detection of unwanted (‘foreign’) objects within products is a common procedure in many branches of industry for maintaining production quality. X-ray imaging is a fast, non-invasive and widely applicable method for foreign object detection. Deep learning has recently emerged as a powerful approach for recognizing patterns in radiographs (i.e., X-ray images), enabling automated X-ray based foreign object detection. However, these methods require a large number of training examples, and manual annotation of these examples is a subjective and laborious task. In this work, we propose a Computed Tomography (CT) based method for producing training data for supervised learning of foreign object detection, with minimal labor requirements. In our approach, a few representative objects are CT scanned and reconstructed in 3D. The radiographs that are acquired as part of the CT-scan data serve as input for the machine learning method. High-quality ground truth locations of the foreign objects are obtained through accurate 3D reconstructions and segmentations. Using these segmented volumes, corresponding 2D segmentations are obtained by creating virtual projections. We outline the benefits of objectively and reproducibly generating training data in this way. In addition, we show how the accuracy depends on the number of objects used for the CT reconstructions. The results show that in this workflow, generally only a relatively small number of representative objects (i.e., fewer than 10) is needed to achieve adequate detection performance in an industrial setting.
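The virtual-projection step can be summarized in a few lines: once the foreign objects are segmented in the reconstructed volume, a 2D ground-truth mask for each radiograph follows by projecting the segmentation along the beam. The sketch below, including the hypothetical helper virtual_projection_mask, assumes an idealized parallel-beam geometry with the sample rotating about the vertical axis; a real workflow would use the scanner's actual (e.g., cone-beam) projection geometry.

```python
# Sketch under assumed parallel-beam geometry; not the paper's exact pipeline.
import numpy as np
from scipy.ndimage import rotate

def virtual_projection_mask(mask3d, angle_deg):
    """2D ground-truth mask for the radiograph acquired at angle_deg.

    Assumes parallel beams along axis 1 and sample rotation about axis 0.
    """
    # Rotate the segmented volume to the acquisition angle (horizontal plane).
    rotated = rotate(mask3d.astype(float), angle_deg, axes=(1, 2),
                     reshape=False, order=1)
    # A detector pixel is labelled 'foreign object' if the ray through it
    # picks up appreciable mass from the segmentation (0.5 is a heuristic
    # threshold against interpolation blur).
    return rotated.sum(axis=1) > 0.5

# Toy example: an 8x8x8 'foreign object' inside a 64^3 reconstruction.
vol = np.zeros((64, 64, 64), dtype=bool)
vol[28:36, 30:38, 25:33] = True
masks = {a: virtual_projection_mask(vol, a) for a in range(0, 180, 30)}
print(masks[30].shape, masks[30].sum())  # one 2D label image per angle
```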